Train Status
The Train Log contains comprehensive validation metrics:
- Self-validation Classification Metrics
- Cross Validation Classification Metrics
To view the train status, download the train log and check the metrics.
Self-validation Classification Metrics
The DRUID ChatBot Portal performs self-validation using the full train set to evaluate the trained model and obtain its accuracy metrics.
The table below describes the self-validation general metrics; an illustrative sketch of how these counts relate follows the table.
Metric | Description |
---|---|
Accuracy (A) | Counts the phrases correctly identified, both as a count and as a percentage of the total number of tests. Accuracy = True Match / Count All. In the Train Set tab, each train phrase is used for testing; in the Test Set tab, each test phrase is used for testing. Note: You might also consider True Match Single and True Match Second as positive tests, because at run time the DRUID chatbot presents the user with the intents found and asks for confirmation. |
Error (E) | Counts the phrases for which the intent was not identified (Unknown) or was falsely identified (different from the expected one). In the Train Set tab, the expected intent is the intent under which the phrase is set. In the Test Set tab, the authors provide the expected intent for each test phrase. |
True match (TM) | Counts all the phrases where the identified intent is the expected one. The higher the value, the better. In the Train Set, the expected intent is the flow to which the phrase belongs. In the Test Set, the authors provide the expected intent together with the phrase. |
True match single | Counts all the phrases where a single intent is identified and it is the expected one. The higher the value, the better. In the Train Set, the expected intent is the flow to which the phrase belongs. In the Test Set, the authors provide the expected intent together with the phrase. |
True match first (TP1) | Multiple intents may be identified for a phrase, each with a different confidence score. True Match First counts the phrases for which multiple intents are identified and the expected intent has the highest score. |
True match second (TP2) | Multiple intents may be identified for a phrase, each with a different confidence score. True Match Second counts the phrases for which multiple intents are identified and the expected intent has the second highest score. |
False match (FM) | Counts all the phrases where an identified intent is different from the expected one. The lower the value, the better. In the Train Set, the expected intent is the flow to which the phrase belongs. In the Test Set, the authors provide the expected intent together with the phrase. |
True unknown (TU) | Counts all the phrases with no intent identified, where this was expected. Note: This metric applies only to the Test Set. |
False unknown (FU) | Counts the phrases where the expected intent is not found. In the Train Set, you should have False Unknown = 0. In the Test Set, you can specify the expected intent for each phrase or leave it empty when you expect no intent to be found; it is good practice to include phrases in your Test Set for which you do not expect an intent. Unknown results are counted as False Unknown or True Unknown based on this rule. |
Count All | The number of phrases included in the test. For the Train Set, it counts all training phrases from all the flows. For the Test Set, it counts all the test phrases. |
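To make the relationships between these counts concrete, here is a minimal sketch in Python. It is not DRUID code: the `TestCase` structure and `tally` function are hypothetical, assuming each test phrase carries an expected intent (None when no intent is expected, as in the Test Set) and the list of identified intents ordered by descending confidence.

```python
# Hypothetical illustration only -- not DRUID code or its internal algorithm.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestCase:
    expected: Optional[str]   # expected intent; None when "Unknown" is expected (Test Set only)
    identified: list          # identified intents, ordered by descending confidence

def tally(cases):
    tm = tm_single = tm_first = tm_second = fm = tu = fu = 0
    for c in cases:
        if not c.identified:                          # no intent identified
            if c.expected is None:
                tu += 1                               # True unknown (TU)
            else:
                fu += 1                               # False unknown (FU)
        elif c.expected in c.identified:
            tm += 1                                   # True match (TM)
            if len(c.identified) == 1:
                tm_single += 1                        # True match single
            elif c.identified[0] == c.expected:
                tm_first += 1                         # True match first (TP1)
            elif c.identified[1] == c.expected:
                tm_second += 1                        # True match second (TP2)
        else:
            fm += 1                                   # False match (FM)
    count_all = len(cases)
    return {
        "Accuracy": tm / count_all,                   # Accuracy = True Match / Count All
        "Error": (fm + fu) / count_all,               # not identified or falsely identified
        "True match": tm, "True match single": tm_single,
        "True match first": tm_first, "True match second": tm_second,
        "False match": fm, "True unknown": tu, "False unknown": fu,
        "Count All": count_all,
    }
```

For example, a phrase whose expected intent is returned as the only candidate counts toward both True match and True match single in this sketch.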
Cross Validation Classification Metrics
The DRUID ChatBot Portal performs cross validation using a portion of the train set to evaluate the trained model and obtain its accuracy metrics.
The table below describes the cross-validation metrics; an illustrative sketch of the underlying formulas follows the table.
Metric | Description |
---|---|
Micro accuracy average | The micro-average is the fraction of instances predicted correctly across all classes. It can be more useful than the macro-average if class imbalance is suspected (i.e., one class has many more instances than the rest). |
Micro accuracies standard deviation | |
Micro accuracies confidence interval 95 | |
Macro accuracy average | The average accuracy at the class level. The accuracy of each class is computed, and the macro-accuracy is the average of these accuracies. It gives the same weight to each class, regardless of the number of class instances in the data set. |
Macro accuracies standard deviation | |
Macro accuracies confidence interval 95 | |
Log loss average | Measures the performance of a classifier with respect to how much the predicted probabilities diverge from the true class label. A lower value indicates a better model. A perfect model predicts a probability of 1 for the true class and has a log loss of 0. |
Log loss standard deviation | |
Log loss confidence interval 95 | |
Log loss reduction average | The relative log loss, also known as the reduction in information gain (RIG). It measures how much a model improves on a model that gives random predictions. A log loss reduction closer to 1 indicates a better model. |
Log loss reduction standard deviation | |
Log loss reduction confidence interval 95 | |
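To clarify what these aggregates measure, here is a minimal sketch of the standard formulas evaluated on a single cross-validation fold. It is not DRUID code; `y_true` and `proba` are hypothetical inputs (the true intent index per phrase, and the predicted probability per intent per phrase), and the baseline used for log loss reduction here is the class-prior model, one common choice for the "random predictions" reference. The average, standard deviation, and confidence interval 95 rows in the table summarize values of this kind computed per fold.

```python
# Standard metric definitions illustrated on one cross-validation fold -- not DRUID code.
import math

def fold_metrics(y_true, proba, n_classes):
    """y_true: true intent index per phrase; proba: predicted probabilities per phrase."""
    n = len(y_true)
    y_pred = [max(range(n_classes), key=lambda c: p[c]) for p in proba]

    # Micro accuracy: fraction of all instances predicted correctly.
    micro_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Macro accuracy: accuracy computed per class, then averaged with equal class weights.
    per_class = []
    for c in range(n_classes):
        members = [i for i, t in enumerate(y_true) if t == c]
        if members:
            per_class.append(sum(y_pred[i] == c for i in members) / len(members))
    macro_accuracy = sum(per_class) / len(per_class)

    # Log loss: average negative log of the probability assigned to the true class.
    eps = 1e-15
    log_loss = -sum(math.log(max(proba[i][y_true[i]], eps)) for i in range(n)) / n

    # Log loss reduction (RIG): improvement over a baseline that predicts class priors.
    priors = [sum(t == c for t in y_true) / n for c in range(n_classes)]
    baseline_loss = -sum(math.log(max(priors[t], eps)) for t in y_true) / n
    log_loss_reduction = 1.0 - log_loss / baseline_loss

    return micro_accuracy, macro_accuracy, log_loss, log_loss_reduction
```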